
samtools view -S -b ENCFF000XJP_chp1_filt.sam > ENCFF000XJP_chp1_filt.bam

samtools view -S -b ENCFF000XJS_chp2_filt.sam > ENCFF000XJS_chp2_filt.bam

samtools view -S -b ENCFF000XKD_chp3_filt.sam > ENCFF000XKD_chp3_filt.bam

A BAM file takes less storage space than the equivalent SAM file, so we can delete the SAM files to save storage space if we need to. Just be careful not to delete the BAM files.
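One cautious way to free the space is to delete a SAM file only after its BAM counterpart passes a basic integrity check. The following is a minimal sketch, assuming the BAM file sits next to the SAM file and has the same name with a ".bam" extension. It relies on "samtools quickcheck", which tests that a file has a valid header and an intact end-of-file block. The helper name remove_sam_if_bam_ok is illustrative and is not part of samtools.

```shell
#!/usr/bin/env bash
# Sketch: delete a SAM file only when the matching BAM file passes a basic check.
# remove_sam_if_bam_ok is an illustrative helper, not a samtools command.
remove_sam_if_bam_ok() {
    local sam=$1
    local bam=${sam%.sam}.bam
    # quickcheck exits non-zero if the BAM is missing, truncated, or lacks a header.
    if samtools quickcheck "$bam"; then
        rm -- "$sam"
    else
        echo "keeping $sam: $bam failed samtools quickcheck" >&2
        return 1
    fi
}

for sam in ENCFF000XJP_chp1_filt.sam ENCFF000XJS_chp2_filt.sam ENCFF000XKD_chp3_filt.sam; do
    # Skip quietly when the SAM file is not in the current directory.
    if [ -f "$sam" ]; then
        remove_sam_if_bam_ok "$sam" || true
    fi
done
```

A passing quickcheck does not prove that every alignment was converted, so comparing "samtools view -c" counts for the SAM and BAM files before deleting is a reasonable extra precaution.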

We now have three BAM files for the three ChIP-Seq samples and one BAM file for the control sample. Before proceeding, we need to know the number of alignments in each file. For each ChIP-Seq file, we then draw a sample of control reads approximately equal in number to the reads in that ChIP-Seq file, to serve as the input reads for that file. We do that to avoid library coverage bias.

The following “samtools view” commands count the alignments in each BAM file:

samtools view -c ENCFF000XGP_inp0_filt.bam

samtools view -c ENCFF000XJP_chp1_filt.bam

samtools view -c ENCFF000XJS_chp2_filt.bam

samtools view -c ENCFF000XKD_chp3_filt.bam
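The four counts can also be collected in one pass. The following is a small sketch that prints each file name next to its alignment count, separated by a tab. The helper name count_alignments is illustrative and is not part of samtools.

```shell
#!/usr/bin/env bash
# Sketch: print "file<TAB>alignment count" for each BAM file given.
# count_alignments is an illustrative helper, not a samtools command.
count_alignments() {
    local bam
    for bam in "$@"; do
        printf '%s\t%s\n' "$bam" "$(samtools view -c "$bam")"
    done
}

# Run only when the control BAM file from this section is present.
if [ -f ENCFF000XGP_inp0_filt.bam ]; then
    count_alignments ENCFF000XGP_inp0_filt.bam ENCFF000XJP_chp1_filt.bam \
        ENCFF000XJS_chp2_filt.bam ENCFF000XKD_chp3_filt.bam
fi
```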

Table 6.1 shows the number of aligned reads in each BAM file and the sampling factor, which is the read count of a ChIP-Seq file divided by the read count of the control file. This fraction is used to sample input reads from the control file for that ChIP-Seq file.
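As a check on the arithmetic, the sampling factors can be reproduced from the read counts alone. The sketch below hard-codes the counts reported in Table 6.1 and uses awk for the division, so it runs without samtools. The function name sampling_factor is illustrative.

```shell
#!/usr/bin/env bash
# Sampling factor = ChIP-Seq read count / control read count.
# The function name is illustrative; the counts are those listed in Table 6.1.
sampling_factor() {
    local chip_count=$1 control_count=$2
    awk -v c="$chip_count" -v i="$control_count" 'BEGIN { printf "%.6f\n", c / i }'
}

inpc=30923163    # ENCFF000XGP_inp0_filt.bam (control)
for chip_count in 8942010 12748871 13217349; do
    sampling_factor "$chip_count" "$inpc"
done
```

Rounded to six decimal places, the three values agree with the Sampling Factor column of Table 6.1.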

The following commands store the counts in bash variables and then use the “samtools view” command to draw a subsample of reads from the control file and store them in a separate control file for that ChIP-Seq file. The “-b” option outputs a BAM file, and the “-s” option draws a subsample from the file: the fractional part of its value is the proportion of reads to keep, and the integer part seeds the random number generator.

inpc=$(samtools view -c ENCFF000XGP_inp0_filt.bam)

chp1=$(samtools view -c ENCFF000XJP_chp1_filt.bam)

fact1=$(echo "scale=6; $chp1/$inpc" | bc)

samtools view -b -s "$fact1" ENCFF000XGP_inp0_filt.bam > ENCFF000XGP_inp0_filt_inp1.bam
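The commands above cover the first ChIP-Seq file only; the same steps apply to the other two. The following is a minimal sketch that wraps them in a function and loops over all three files. Several details are assumptions rather than part of the original commands: the helper name subsample_control is illustrative, the output names ending in "_inp2" and "_inp3" are chosen by analogy with "_inp1", and awk is used in place of bc so that the factor is printed with a leading zero.

```shell
#!/usr/bin/env bash
# Sketch: draw a control subsample matched to one ChIP-Seq BAM file.
# subsample_control is an illustrative helper, not a samtools command.
subsample_control() {
    local control=$1 chip=$2 out=$3
    local inpc chpc fact
    inpc=$(samtools view -c "$control")
    chpc=$(samtools view -c "$chip")
    # A control subsample can only match the ChIP-Seq file if the control is larger.
    if [ "$chpc" -ge "$inpc" ]; then
        echo "skip $chip: control has too few reads" >&2
        return 1
    fi
    fact=$(awk -v c="$chpc" -v i="$inpc" 'BEGIN { printf "%.6f", c / i }')
    # -s takes SEED.FRACTION: the integer part (0 here) seeds the sampler.
    samtools view -b -s "$fact" "$control" > "$out"
}

control=ENCFF000XGP_inp0_filt.bam
n=1
for chip in ENCFF000XJP_chp1_filt.bam ENCFF000XJS_chp2_filt.bam ENCFF000XKD_chp3_filt.bam; do
    # Output names follow the _inp1 pattern used above; _inp2 and _inp3 are assumed.
    if [ -f "$control" ] && [ -f "$chip" ]; then
        subsample_control "$control" "$chip" "${control%.bam}_inp${n}.bam"
    fi
    n=$((n + 1))
done
```

Because the subsample is drawn at random, the resulting control read counts are close to, but not exactly equal to, the ChIP-Seq read counts.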

TABLE 6.1  Read Count in Each BAM File, the Fraction for Sampling Reads from the Control BAM File, and Number of Reads in the Control File for Each ChIP-Seq File

Sample                       Read Count    Sampling Factor    Control Read Count
ENCFF000XGP_inp0_filt.bam    30,923,163    N/A                N/A
ENCFF000XJP_chp1_filt.bam    8,942,010     0.289168673        8,941,151
ENCFF000XJS_chp2_filt.bam    12,748,871    0.412275775        12,744,729
ENCFF000XKD_chp3_filt.bam    13,217,349    0.427425519        13,212,672